Abstract. The paper studies large sample asymptotic properties of the
Maximum Likelihood Estimator (MLE) for the parameter of a continuous
time Markov chain, observed in white noise. Using the method of
weak convergence of likelihoods due to I.Ibragimov and R.Khasminskii
[14], consistency, asymptotic normality and convergence of moments are
established for MLE under certain strong ergodicity conditions on the
chain.



1. Introduction 

1.1. The setting and the main result. Consider a pair of continuous
time random processes $(S,X) = (S_t, X_t)_{t\ge 0}$, where $S$ is a signal Markov
chain with values in a finite real set $\mathbb{S} = \{a_1,\dots,a_d\}$ and $X$ is given by

$$X_t = \int_0^t h(S_r)\,dr + B_t,$$

with an $\mathbb{S}\mapsto\mathbb{R}$ function $h$ and a Brownian motion $B$, independent of $S$.
Let $\Lambda = (\lambda_{ij})$, $i,j\in\{1,\dots,d\}$, and $\nu$ be the transition rates and the initial
distribution of the chain respectively. Suppose the model, i.e. $\Lambda$ and $h$,
depends on a parameter $\theta\in\Theta$, with $\Theta$ being a bounded open subset of $\mathbb{R}^n$,
which is to be estimated given the observed trajectory $X^T = \{X_s,\ 0\le s\le T\}$.
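The observation model above can be illustrated numerically. The following is a minimal sketch, not part of the paper: it simulates $(S,X)$ on a time grid, assuming a first-order (Euler type) approximation of the jump probabilities. The function name `simulate`, the grid step `dt` and the two-state example at the bottom are hypothetical choices made for illustration only.

```python
import math
import random

def simulate(rates, h, nu, T, dt, seed=0):
    """Grid simulation of the signal S and the observation increments dX.

    rates[i][j] is the transition rate from state i to state j (i != j),
    h[i] is the value of the observation function in state i and
    nu is the initial distribution (all names are illustrative).
    """
    rng = random.Random(seed)
    d = len(h)
    s = rng.choices(range(d), weights=nu)[0]  # initial state drawn from nu
    states, dX = [], []
    for _ in range(int(round(T / dt))):
        states.append(s)
        # dX_t = h(S_t) dt + dB_t, with dB_t ~ N(0, dt)
        dX.append(h[s] * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0))
        # jump s -> j with probability rates[s][j] * dt (first-order approximation)
        u, acc = rng.random(), 0.0
        for j in range(d):
            if j == s:
                continue
            acc += rates[s][j] * dt
            if u < acc:
                s = j
                break
    return states, dX

# Hypothetical example: symmetric two-state chain switching at rate theta
theta = 1.0
rates = [[-theta, theta], [theta, -theta]]
states, dX = simulate(rates, h=[0.0, 1.0], nu=[0.5, 0.5], T=10.0, dt=0.01)
```

The approximation is only reasonable when `rates[s][j] * dt` is small for all pairs of states.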

In this paper we study the large sample asymptotic properties of the
Maximum Likelihood Estimator (MLE) $\hat\theta_T$ of $\theta$ given $X^T$. For a fixed value of
the parameter, let $P_\theta$ denote the probability measure, induced by $(S,X)$ on
the corresponding function space $D_{[0,\infty)}\times C_{[0,\infty)}$, and let $\mathcal{F}^X_t$ be the natural
filtration of $X$. Introduce the filtering process $\pi^\theta = (\pi^\theta_t)_{t\ge 0}$ with values in
the simplex of probability vectors $\mathcal{S}^{d-1} = \{x\in\mathbb{R}^d : x_i\ge 0,\ \sum_{i=1}^d x_i = 1\}$,
whose entries are the conditional probabilities $\{\pi^\theta_t\}_i := P_\theta(S_t = a_i\,|\,\mathcal{F}^X_t)$.
As is well known, the process

$$\bar B_t = X_t - \int_0^t h^*\pi^\theta_s\,ds$$

is the innovation Brownian motion with respect to $\mathcal{F}^X_t$ and by the Girsanov
theorem the likelihood, i.e. the Radon-Nikodym derivative of $P_\theta$, restricted
to $\mathcal{F}^X_T$, with respect to the Wiener measure on $C_{[0,T]}$, is given by

$$L_T(\theta; X^T) := \exp\Big\{\int_0^T h^*\pi^\theta_t\,dX_t - \frac12\int_0^T (h^*\pi^\theta_t)^2\,dt\Big\},$$

where $h$ is the vector with entries $h_i = h(a_i)$, $i = 1,\dots,d$, and $h^*$ denotes its
transpose. We shall define the MLE $\hat\theta_T$ as a maximizer of the likelihood:

$$\hat\theta_T := \mathrm{argmax}_{\theta\in\bar\Theta}\, L_T(\theta; X^T), \tag{1.1}$$

where $\bar\Theta$ stands for the closure of $\Theta$. If $\Lambda$ and $h$ are continuous in $\theta$,
$L_T(\theta;X^T)$ is a continuous function of $\theta$ on $\bar\Theta$ with probability one and
hence the maximum value is attained, perhaps at multiple values of $\theta$, in
which case any maximizer is chosen.
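To make the definition (1.1) concrete, here is a hedged sketch of how the likelihood could be approximated from discretely sampled increments of $X$ and maximized over a grid. It is not the author's procedure: it assumes a symmetric two-state chain with switching rate `theta`, an Euler discretization of the filtering equation, and a plain grid search; `log_likelihood`, `mle_on_grid` and all numerical values are hypothetical.

```python
import math
import random

def log_likelihood(theta, dX, dt, h=(0.0, 1.0), p0=0.5):
    """Discretized log L_T(theta; X^T) for a symmetric two-state chain.

    p approximates the filtering probability of state 1; it is propagated
    by an Euler step of the filtering equation and clipped to (0, 1).
    """
    p, ll = p0, 0.0
    for dx in dX:
        hp = h[0] * (1.0 - p) + h[1] * p       # h^* pi_t
        ll += hp * dx - 0.5 * hp * hp * dt     # Ito sum minus half the dt-integral
        drift = theta * (1.0 - 2.0 * p)        # second entry of Lambda^* pi
        gain = p * (1.0 - p) * (h[1] - h[0])   # second entry of (diag(pi) - pi pi^*) h
        p += drift * dt + gain * (dx - hp * dt)
        p = min(max(p, 1e-9), 1.0 - 1e-9)
    return ll

def mle_on_grid(dX, dt, grid):
    """Grid-search stand-in for the argmax in (1.1)."""
    return max(grid, key=lambda th: log_likelihood(th, dX, dt))

# Hypothetical data: chain with true rate theta0 = 1 and h = (0, 1)
rng = random.Random(0)
theta0, dt, n = 1.0, 0.01, 5000
s, dX = 0, []
for _ in range(n):
    dX.append(float(s) * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0))
    if rng.random() < theta0 * dt:
        s = 1 - s
grid = [0.2 * k for k in range(1, 26)]
theta_hat = mle_on_grid(dX, dt, grid)
```

A grid search only locates the maximizer up to the grid resolution; it is used here because it does not require derivatives of the likelihood.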

In fact, for any $\theta,\eta\in\Theta$ the restrictions of $P_\theta$ and $P_\eta$ on $\mathcal{F}^X_T$ are equivalent
(see e.g. [23]) with the corresponding likelihood

$$L_T(\theta,\eta; X^T) := \frac{L_T(\theta; X^T)}{L_T(\eta; X^T)}$$

and for any $\eta\in\Theta$

$$\hat\theta_T = \mathrm{argmax}_{\theta\in\bar\Theta}\, L_T(\theta,\eta; X^T). \tag{1.2}$$

The latter expression is more convenient for the analysis and, in
fact, we shall work with (1.2), fixing $\eta := \theta_0$, where $\theta_0$ is the actual
(unknown) value of the parameter. This choice is quite natural as we study $\hat\theta_T$
under the measure $P_{\theta_0}$.

To simplify the presentation, we shall consider the case of a scalar parameter,
i.e. $\Theta\subset\mathbb{R}$, and, moreover, assume that $h$ does not depend on $\theta$ (this
issue is briefly addressed in Section 5). Our main result is the following
theorem.

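Theorem 1.1. Assume
(a-1) $\lambda_{ij}(\theta)$ are twice continuously differentiable on $\bar\Theta$ and

$$\min_{\theta\in\bar\Theta}\ \min_{i\ne j}\ \lambda_{ij}(\theta) > 0; \tag{1.3}$$

(a-2) the model is identifiable in the sense that the function $g(\theta_0,\theta) :=
E_{\theta_0}\big(h^*\tilde\pi^{\theta_0}_0 - h^*\tilde\pi^{\theta}_0\big)^2$, where $(\tilde\pi^{\theta_0}_0,\tilde\pi^{\theta}_0)$ are random vectors, sampled
from the unique invariant measure¹ of the Markov process $(\pi^{\theta_0}_t,\pi^{\theta}_t)$
under $P_{\theta_0}$, satisfies

$$\inf_{\theta_0\in K}\ \inf_{|\theta-\theta_0|\ge r} g(\theta_0,\theta) > 0,\quad \forall r>0 \tag{1.4}$$

for any compact $K\subset\Theta$;
(a-3) the Fisher information²

$$I(\theta_0) := \lim_{T\to\infty}\frac1T\int_0^T \big(h^*\dot\pi^{\theta_0}_t\big)^2\,dt$$

is well defined (as the unique limit in $P_{\theta_0}$-probability), is positive
and, moreover,

$$\lim_{t\to\infty} E_{\theta_0}\big(h^*\dot\pi^{\theta_0}_t\big)^2 = I(\theta_0)$$

uniformly on compacts $K\subset\Theta$.
Then the MLE $\hat\theta_T$ is uniformly consistent:

$$\lim_{T\to\infty}\ \sup_{\theta_0\in K} P_{\theta_0}\big(|\hat\theta_T-\theta_0|\ge\varepsilon\big) = 0,\quad \forall\varepsilon>0,$$

uniformly asymptotically normal, i.e.

$$\lim_{T\to\infty}\ \sup_{\theta_0\in K}\Big|E_{\theta_0} f\big(\sqrt{T}(\hat\theta_T-\theta_0)\big) - E f(\xi)\Big| = 0,\quad \forall f\in C_b,$$

with a zero mean Gaussian random variable $\xi$, with variance $1/I(\theta_0)$. Moreover,
the moments converge:

$$\lim_{T\to\infty} E_{\theta_0}\big|\sqrt{T}(\hat\theta_T-\theta_0)\big|^p = E|\xi|^p,\quad \forall p>0,$$

uniformly over compacts $K\subset\Theta$.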

Several remarks are in order.

Remark 1.2. The condition (a-1) implies that the chain $S$ is ergodic, but it is
an excessively strong requirement as far as just the ergodicity is concerned:
in fact, $S$ is ergodic if and only if all its states communicate (or equivalently
the entries of the matrix exponent $\exp(\Lambda)$ are all positive). (a-1) plays a
decisive role in the proof, as it implies appropriate ergodic properties of the
filtering process $\pi^\theta = (\pi^\theta_t)_{t\ge 0}$.
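The ergodicity criterion mentioned in Remark 1.2 (positivity of all entries of $\exp(\Lambda)$) can be checked numerically. The sketch below is an illustration only: it computes $\exp(\Lambda)$ by uniformization, a standard technique that is not used in the paper, and the helper names `expm_generator` and `all_states_communicate` are hypothetical. The first example matrix violates (1.3), since some off-diagonal rates vanish, and yet the chain is ergodic.

```python
import math

def expm_generator(L, terms=60):
    """exp(L) for a rate matrix L via uniformization.

    With q >= max_i |L[i][i]| and P = I + L/q (a stochastic matrix),
    exp(L) = sum_k e^{-q} q^k / k! * P^k, a series with nonnegative terms.
    """
    d = len(L)
    q = max(-L[i][i] for i in range(d)) or 1.0
    P = [[(1.0 if i == j else 0.0) + L[i][j] / q for j in range(d)] for i in range(d)]
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # P^0
    weight = math.exp(-q)  # Poisson weight for k = 0
    result = [[weight * power[i][j] for j in range(d)] for i in range(d)]
    for k in range(1, terms):
        power = [[sum(power[i][m] * P[m][j] for m in range(d)) for j in range(d)]
                 for i in range(d)]
        weight *= q / k
        for i in range(d):
            for j in range(d):
                result[i][j] += weight * power[i][j]
    return result

def all_states_communicate(L):
    """True if every entry of the (truncated) series for exp(L) is positive."""
    return all(x > 0.0 for row in expm_generator(L) for x in row)

# Cyclic chain 1 -> 2 -> 3 -> 1: some rates are zero, but the chain is ergodic
cyclic = [[-1.0, 1.0, 0.0], [0.0, -1.0, 1.0], [1.0, 0.0, -1.0]]
# Chain with an absorbing state: state 1 cannot be reached from state 2
absorbing = [[-1.0, 1.0], [0.0, 0.0]]
```

Uniformization is chosen because all terms of the series are nonnegative, so an entry that is exactly zero in $\exp(\Lambda)$ is also computed as zero.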

The assumptions (a-2) and (a-3) are of identifiability and regularity type
and should be checked on a case-by-case basis. In Section 4 this is
demonstrated with an example, where both are verified explicitly in terms of the
model data.

¹ $(\pi^{\theta_0}_t,\pi^{\theta}_t)$ is indeed a Markov process and it has a unique invariant measure under (a-1),
see Lemma 3.6 below.

² Here $\dot\pi^{\theta}_t := \frac{\partial}{\partial\theta}\pi^{\theta}_t$ in the $P_{\theta_0}$-a.s. sense: such a derivative exists, when $\theta\mapsto\Lambda(\theta)$ is
continuously differentiable.
Remark 1.3. The calculation of $\hat\theta_T$ can be quite an involved numerical
optimization problem, which we do not discuss here. Let us just mention that an
effective iterative EM procedure for finding a local extremum of $L_T(\theta;X^T)$
was suggested in [6], [31] (see also the monograph [?] for additional details).
However, its convergence to the actual value of $\hat\theta_T$, i.e. to the global
maximum, remains unclear.

1.2. Continuous versus discrete time. The interest in parameter esti- 
mation problems with partial observations can be traced back at least to the 
works of L.E.Baum and T.A.Petrie [1], [27], who verified consistency of MLE
for discrete time models with both signal and observation processes taking
a finite number of values. The question is very natural in the context of many
engineering problems (see e.g. [9], 0], [I^). The next major advance has 
been made by B.Leroux in [22], where the observation process with general 
state space was assumed and consistency of MLE was verified under quite 
general assumptions. A partial extension to the signals with general state 
spaces was recently reported in [12]. Spelled in our notations, the main idea
is to consider the limit

$$\lim_{T\to\infty}\frac1T\log L_T(\theta,\theta_0;X^T) = -H(\theta,\theta_0), \tag{1.5}$$

where $H(\theta,\theta_0)$ is the Kullback-Leibler relative entropy rate between the
restrictions of $P_\theta$ and $P_{\theta_0}$ on $\mathcal{F}^X_T$. If the system is identifiable, $H(\theta,\theta_0)$ attains
its unique minimum at $\theta=\theta_0$ and consistency follows. It is the convergence
in (1.5) and the verification of the identifiability conditions, which turn out to
be quite challenging matters.

The asymptotic normality was established in [3] and the extension to
signals in general spaces followed in [15]. Roughly, the idea is to expand
the likelihood function into powers of the estimation error $\hat\theta_T-\theta_0$, which
vanishes as $T\to\infty$ by consistency, and the proof then amounts to verifying
the appropriate convergence of various residual terms. In continuous time
the direct implementation of this procedure is quite nontrivial as it requires
substitution of the anticipating random variable $\hat\theta_T$ into the first argument
of $L_T(\theta,\theta_0;X^T)$, which involves the Ito integral. Though, in principle, such
treatment is possible within the framework of Malliavin calculus, it would
be, perhaps, excessively technical.

We shall prove Theorem 1.1 by realizing the program developed by I.Ibragimov
and R.Khasminskii in the early 70's [14]. The main idea of this approach
is to deduce the asymptotic properties of MLE from the weak convergence of
the appropriately scaled likelihoods, viewed as elements in a function space
(more details are given for the reader's convenience in Section 2 below).
When applied to the large sample asymptotic problems, this method typically
requires good ergodic properties of the related processes (see e.g. the
monograph [19]) - in our case, the filtering process $\pi^\theta = (\pi^\theta_t)_{t\ge 0}$. While for
the Kalman-Bucy linear Gaussian models, such ergodic properties are long
known and are implied by stability of the associated Riccati equation, the
nonlinear case has been studied only during the last decade (see e.g. an
already not quite up to date list of references in [2]). The role of the ergodic
properties of the filtering process in the MLE analysis context (in discrete time)
was first recognized in [20] (see also [21]) and developed further in [7],
[1] (see also [26] for a different approach).

The inference of stochastic processes in continuous time is natural in
e.g. mathematical finance, where the asset prices are thought of as positive
diffusion processes, such as geometric Brownian motion, etc. Though in
practice the inference is made from observations, obtained by sampling the
prices at discrete times, the analysis of estimates, based on the continuous
time observations, is of practical interest, as it may hint at the fundamental
performance limitations of the model.

The large sample asymptotic properties of MLE for continuous time models
with partial observations seem to have never been addressed, beyond the
linear Gaussian Kalman-Bucy setting (see [18] or e.g. Section 3.1 [19] for
prototypical examples and the references therein).

Besides being conceptually appealing in its universality, the Ibragimov-Khasminskii
approach allows one to derive stronger properties of MLE, namely
the convergence of moments. To the author's understanding, the latter has
not yet been addressed even for the discrete time HMMs.

As was mentioned before, the computational aspects of MLE have attracted
more attention: e.g. the EM algorithm was implemented in [6], [31]
and [9] for the setting, considered in the present paper. Some results on
recursive parameter estimation for partially observed diffusions appeared in 
[23] and [H]. 

Below, in Section 2, we proceed with a brief reminder of the Ibragimov-Khasminskii
approach. Section 3 contains the proof of Theorem 1.1 and
Section 4 presents an example for which the conditions of Theorem 1.1 are
verified explicitly. Finally, a concise discussion of the results is given in
Section 5.

Notations and conventions. Throughout $C_i$ or $C_{ij}$, $i,j\in\{1,2,\dots\}$, denote
generic constants, whose precise value is not important and, moreover,
may be different depending on the context (e.g. in separate claims, proofs,
etc.). We shall write $x_i$ or $\{x\}_i$ for the $i$-th entry of the vector $x$. All the
statements, involving random objects, are understood to hold in the $P_{\theta_0}$-a.s.
sense, if not mentioned otherwise.

Acknowledgements. I would like to express my gratitude to Yury Ku- 
toyants, who suggested the problem and whose advice was crucial for the 
progress towards its solution. I would also like to thank Marina Kleptsyna 
for many enlightening discussions and her interest in this work. I am in- 
debted to them for their hospitality, without which my stay in France would 
never have been the same. Correspondence with Ramon Van Handel was 
essential at various stages of this project and is greatly appreciated. 
2. The Ibragimov-Khasminskii program

The main idea of I.Ibragimov and R.Khasminskii [14] is to consider the
sequence of scaled likelihoods

$$Z_T(u) := L_T(\theta_0 + u\varphi_T,\theta_0; X^T),\quad u\in U_T := (\Theta-\theta_0)/\varphi_T,$$

where $\varphi_T$ is an appropriate scaling function (in our case $\varphi_T = 1/\sqrt{T}$), as
elements from the space $C_0$ of continuous $\mathbb{R}\mapsto\mathbb{R}$ functions, vanishing at
$\pm\infty$, with the norm $\|\psi\| = \sup_{y\in\mathbb{R}}|\psi(y)|$. As $Z_T(u)$ is defined only on $U_T$,
its definition is extended to $\mathbb{R}$ to make it an element of $C_0$, in such a way
that its supremum remains unaltered.
For a measurable set $A\subset\mathbb{R}$

$$P_{\theta_0}\big(\sqrt{T}(\hat\theta_T-\theta_0)\in A\big) = P_{\theta_0}\big(\hat\theta_T\in A/\sqrt{T}+\theta_0\big) =$$
$$P_{\theta_0}\Big(\sup_{\eta\in A/\sqrt{T}+\theta_0} L_T(\eta,\theta_0;X^T) > \sup_{\eta\notin A/\sqrt{T}+\theta_0} L_T(\eta,\theta_0;X^T)\Big) =$$
$$P_{\theta_0}\Big(\sup_{u\in A} Z_T(u) > \sup_{u\notin A} Z_T(u)\Big).$$

Suppose that the sequence of random processes $u\mapsto Z_T(u)$, $T\ge 0$, converges
weakly in the function space $C_0$ to a random process $Z(u)$ and assume $Z(u)$
attains its maximum at a unique point $\hat u$, which has a continuous distribution
(e.g. Gaussian). Then, as the supremum is a continuous functional on $C_0$, we
have

$$P_{\theta_0}\big(\sqrt{T}(\hat\theta_T-\theta_0)\in A\big) \xrightarrow[T\to\infty]{} P_{\theta_0}\Big(\sup_{u\in A} Z(u) > \sup_{u\notin A} Z(u)\Big) = P_{\theta_0}(\hat u\in A).$$

In other words, the distribution of the scaled estimation error
$\sqrt{T}(\hat\theta_T-\theta_0)$ converges to the law of $\hat u$ as $T\to\infty$. The following theorem
gives the precise conditions required for realization of this idea:

Theorem 2.1 (Theorem 10.1 [14]). Let the parameter set $\Theta$ be an open
subset of $\mathbb{R}$ and let the functions $u\mapsto Z_T(u)$ be continuous with probability 1,
possessing the following properties:

(1) To any compact $K\subset\Theta$ there correspond numbers $a$ and $B$ and
a positive function $g(u)$, with $\lim_{u\to\infty}|u|^N e^{-g(u)} = 0$ for any
integer $N$, such that

(a) there exist numbers $\alpha>1$ and $m\ge\alpha$ such that for $\theta_0\in K$

$$\sup_{\substack{|u_1|\le R,\ |u_2|\le R\\ u_1,u_2\in U_T}} |u_2-u_1|^{-\alpha}\, E_{\theta_0}\big|Z_T^{1/m}(u_2) - Z_T^{1/m}(u_1)\big|^m \le B(1+R^a);$$

(b) for all $u\in U_T$ and $\theta_0\in K$,

$$E_{\theta_0} Z_T^{1/2}(u) \le e^{-g(|u|)}.$$

(2) Uniformly in $\theta_0\in K$ for $T\to\infty$ the marginal (finite-dimensional)
distributions of the random functions $Z_T(u)$ converge to marginal
distributions of random functions $Z(u)$, where $Z\in C_0$.

(3) The limit functions $Z(u)$ with probability 1 attain the maximum at
the unique point $\hat u$.

Then uniformly in $\theta_0\in K$ the distribution of the random variables $\sqrt{T}(\hat\theta_T-\theta_0)$
converges to the distribution of $\hat u$ and for any continuous loss function $w$ with
polynomial growth we have uniformly in $\theta_0\in K$

$$\lim_{T\to\infty} E_{\theta_0} w\big(\sqrt{T}(\hat\theta_T-\theta_0)\big) = E w(\hat u).$$

The continuity condition (1a) and the large deviations condition (1b) for
the tails of the likelihoods give tightness of the probability measures, induced by
$Z_T(u)$, while the convergence of finite dimensional distributions (2) identifies
the limit, yielding the aforementioned weak convergence.



3. The proof

The proof reduces to verifying the conditions (1)-(3) of Theorem 2.1 and
is preceded by several important preliminaries. The reader unfamiliar with
the material, sketched in the previous section, is advised to look first at
Section 3.2 below to see how various propositions are applied.

3.1. Preliminary results. The filtering process $\pi^{\theta_0}$ satisfies the Shiryaev-Wonham
equation ([28], [30], see also [M]):

$$d\pi^{\theta_0}_t = \Lambda^*(\theta_0)\pi^{\theta_0}_t\,dt + \big(\mathrm{diag}(\pi^{\theta_0}_t) - \pi^{\theta_0}_t\pi^{\theta_0 *}_t\big)h\,\big(dX_t - h^*\pi^{\theta_0}_t\,dt\big),\quad \pi^{\theta_0}_0 = \nu, \tag{3.1}$$

where $\mathrm{diag}(x)$ denotes a diagonal matrix with $x\in\mathbb{R}^d$ on the diagonal and $h$
stands for a column vector with the entries $h(a_i)$, $i=1,\dots,d$, as before. Having
Lipschitz coefficients, this equation has a strong solution under $P_{\theta_0}$ as well
as under $P_\theta$, $\theta\ne\theta_0$. Under $P_{\theta_0}$, the process $\pi^{\theta_0}_t$ is Markov, since

$$\bar B_t := X_t - \int_0^t h^*\pi^{\theta_0}_s\,ds,\quad t\ge 0 \tag{3.2}$$

is the innovation Brownian motion with respect to the filtration $\mathcal{F}^X_t$.

Further, let $\pi^{\theta_0}_{s,t}(x)$ be the solution of (3.1) on the interval $[s,t]$, started
at $s$ from $x\in\mathring{\mathcal{S}}^{d-1}$ ($\mathring{\mathcal{S}}^{d-1}$ is the interior of $\mathcal{S}^{d-1}$). The map $x\mapsto\pi^{\theta_0}_{s,t}(x)$
defines a stochastic semiflow of smooth diffeomorphisms (see Lemma 2.4 in
[5]), which means that on a set of full probability, it is a smooth injective
function of $x$, satisfying the semigroup property $\pi^{\theta_0}_{0,t} = \pi^{\theta_0}_{s,t}\circ\pi^{\theta_0}_{0,s}$. We shall
also keep the shorter notation $\pi^{\theta_0}_t = \pi^{\theta_0}_{0,t}(\nu)$, whenever appropriate. The
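following facts are central to all the arguments below.

Proposition 3.1 (Proposition 3.5 in [5]). Assume $\lambda_{ij}(\theta)>0$, $i\ne j$, then
for $\mathcal{F}^X_s$-measurable random variables $\mu_1,\mu_2\in\mathcal{S}^{d-1}$

$$\big\|\pi^{\theta}_{s,t}(\mu_1) - \pi^{\theta}_{s,t}(\mu_2)\big\| \le \max_{i=1,\dots,d}\big(1/\{\mu_1\}_i,\ 1/\{\mu_2\}_i\big)\,\|\mu_1-\mu_2\|\,e^{-\gamma(\theta)(t-s)},\quad P_{\theta_0}\text{-a.s.} \tag{3.3}$$

where³

$$\gamma(\theta) := 2\min_{p\ne q}\sqrt{\lambda_{pq}(\theta)\lambda_{qp}(\theta)}. \tag{3.4}$$

Remark 3.2. Sometimes it will be more convenient to use the bound (Corollary
2.3.2 pp. 59 in [29])

$$\big\|\pi^{\theta}_{s,t}(\mu_1) - \pi^{\theta}_{s,t}(\mu_2)\big\| \le 2e^{-\gamma(\theta)(t-s)},\quad P_{\theta_0}\text{-a.s.} \tag{3.5}$$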
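The exponential forgetting of the initial condition expressed by (3.3) and (3.5) can be observed in a small experiment. The sketch below is illustrative only: it replaces the filtering equation by a discrete predict/correct scheme (a splitting-type discretization chosen here for its numerical positivity, not taken from the paper) and runs two filters from different initial conditions on the same simulated observation increments. The helper `filter_step` and all numerical values are hypothetical.

```python
import math
import random

def filter_step(pi, dx, dt, rates, h):
    """One predict/correct step of a discretized filter (illustrative helper)."""
    d = len(pi)
    # prediction: pi <- pi + Lambda^* pi dt
    pred = [pi[i] + dt * sum(rates[j][i] * pi[j] for j in range(d)) for i in range(d)]
    # correction: reweight by exp(h_i dx - h_i^2 dt / 2) and normalize
    w = [pred[i] * math.exp(h[i] * dx - 0.5 * h[i] ** 2 * dt) for i in range(d)]
    total = sum(w)
    return [x / total for x in w]

def l1(p, q):
    """l1 distance between two probability vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))

rng = random.Random(1)
rates = [[-1.0, 1.0], [1.0, -1.0]]   # all off-diagonal rates positive, as in (1.3)
h = [0.0, 1.0]
dt, n_steps = 0.01, 1000
s = 0
mu1, mu2 = [0.9, 0.1], [0.1, 0.9]    # two different initial conditions
initial_gap = l1(mu1, mu2)
for _ in range(n_steps):
    dx = h[s] * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
    if rng.random() < rates[s][1 - s] * dt:
        s = 1 - s
    mu1 = filter_step(mu1, dx, dt, rates, h)   # both filters see the same dx
    mu2 = filter_step(mu2, dx, dt, rates, h)
final_gap = l1(mu1, mu2)
```

After the time horizon $T = 10$ used here, `final_gap` is expected to be many orders of magnitude smaller than `initial_gap`, in line with the rate $\gamma(\theta) = 2$ for this chain.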

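Let $D\pi^{\theta}_{s,t}(\mu)\cdot v$ be the directional derivative of $\pi^{\theta}_{s,t}(\cdot)$ at a point $\mu\in\mathcal{S}^{d-1}$
in the direction $v\in T\mathcal{S}^{d-1}$ (the tangent space to $\mathcal{S}^{d-1}$).

Proposition 3.3 (Proposition 3.3 in [3]). For any $\mu\in\mathcal{S}^{d-1}$ and $v\in T\mathcal{S}^{d-1}$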
{D7rlt{fi)-v}^ = {<t(/u)}^^^{<,(//)},<^.,j(i,j,fc), Peo -a.5. (3.6) 

where ips,tihj,k) is a random process with the property 



Finally we have the following formula, 



max I 



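Proposition 3.4 (Proposition 2.6 in [5]). For any $\mu\in\mathcal{S}^{d-1}$,

$$\pi^{\theta}_{0,t}(\mu) - \pi^{\theta_0}_{0,t}(\mu) = \int_0^t D\pi^{\theta}_{s,t}\big(\pi^{\theta_0}_s\big)\cdot\big(\Lambda^*(\theta)-\Lambda^*(\theta_0)\big)\pi^{\theta_0}_s\,ds. \tag{3.7}$$

Remark 3.5. The statements of all the three propositions remain valid if $\theta$
and $\theta_0$ are interchanged. Let us also emphasize that anything stated $P_{\theta_0}$-a.s.,
holds $P_\theta$-a.s. as well and vice versa.

First we justify the definition of $g(\theta_0,\theta)$ in (a-2) of Theorem 1.1.

Lemma 3.6. The pair $(\pi^{\theta_0}_t,\pi^{\theta}_t)$ is a Markov process under $P_{\theta_0}$ and it has a
unique invariant measure $\mathcal{M}$. For any Lipschitz $f$ with $\int f\,d\mathcal{M} = 0$

$$\big|E_{\theta_0} f(\pi^{\theta_0}_t,\pi^{\theta}_t)\big| \le C e^{-\frac12[\gamma(\theta_0)\wedge\gamma(\theta)]t}, \tag{3.8}$$

with a constant $C$ and $\gamma(\cdot)$, defined in (3.4).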

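Proof. The filtering equation (3.1) has a unique strong solution, subject to
$\pi^{\theta_0}_0 = \nu'$ for any $\nu'\in\mathcal{S}^{d-1}$. If $\nu'$ coincides with $\nu$, the actual distribution
of $S_0$, then the corresponding solution $\pi^{\theta_0}_t$ is the conditional distribution
of $S_t$, given $\mathcal{F}^X_t$, and thus the innovation process $\bar B_t = X_t - \int_0^t h^*\pi^{\theta_0}_s\,ds$ is
a Brownian motion with respect to $\mathcal{F}^X_t$. Consequently, $\pi^{\theta_0}_t$ is a Markov
process and, since $\pi^{\theta}_t$ satisfies

$$d\pi^{\theta}_t = \Lambda^*(\theta)\pi^{\theta}_t\,dt + \big(\mathrm{diag}(\pi^{\theta}_t) - \pi^{\theta}_t\pi^{\theta *}_t\big)h\,\big(d\bar B_t + (h^*\pi^{\theta_0}_t - h^*\pi^{\theta}_t)\,dt\big), \tag{3.9}$$

the pair $(\pi^{\theta_0}_t,\pi^{\theta}_t)$ is Markov as well. Since both processes solve SDEs with
Lipschitz coefficients, this pair is also a Feller process (see e.g. Theorem 3.2
in Ch. III [17]), and since it evolves on a compact state space, at least one
invariant measure exists (e.g. Theorem 2.1 Ch. III, [17]).

³ Throughout $\|\cdot\|$ stands for the $\ell_1$ norm unless stated otherwise.

We shall argue for the uniqueness, by showing that if two measures $\mathcal{M}$ and
$\tilde{\mathcal{M}}$ are invariant, then $\int\phi\,d\mathcal{M} = \int\phi\,d\tilde{\mathcal{M}}$ for any bounded and continuous
$\phi$ and thus $\mathcal{M}$ and $\tilde{\mathcal{M}}$ coincide. For these purposes, we shall explicitly
construct the corresponding stationary processes and flows.

Let $(p,q)$ be a random variable with values in $\mathcal{S}^{d-1}\times\mathcal{S}^{d-1}$ and distribution
$\mathcal{M}$. Introduce an $\mathbb{S}$ valued random variable $S_0$ with conditional distribution
$P(S_0 = a_i\,|\,p,q) = p_i$, $i=1,\dots,d$. Further, let $S$ be a Markov chain with
transition rates matrix $\Lambda(\theta_0)$ and random initial state $S_0$ and define the
corresponding observation process $X_t := \int_0^t h(S_r)\,dr + B_t$. Finally, let $(\pi^{\theta_0}_t,\pi^{\theta}_t)$
be the solutions of (3.1) and (3.9), started from $p$ and $q$, respectively, and
driven by this observation process. Then $\pi^{\theta_0}_t$ is nothing but the vector of conditional
probabilities $P_{\theta_0}\big(S_t = a_i\,|\,\mathcal{F}^X_t\vee\sigma\{p\}\big)$, $i=1,\dots,d$, and thus the corresponding
innovation $X_t - \int_0^t h^*\pi^{\theta_0}_r\,dr$ is a Brownian motion with respect to the
filtration $\mathcal{F}^X_t\vee\sigma\{p\}$. Hence $(\pi^{\theta_0}_t,\pi^{\theta}_t)$ is a Markov process and it is stationary by
construction.

The stationary process $(\tilde\pi^{\theta_0}_t,\tilde\pi^{\theta}_t)$, corresponding to $\tilde{\mathcal{M}}$, is defined similarly,
but using a Markov chain $\tilde S$, coupled with $S$. Namely, following e.g. [13],
one can construct a Markov chain $(S_t,\tilde S_t)$ on $\mathbb{S}\times\mathbb{S}$, such that both $S_t$ and
$\tilde S_t$ are Markov chains on their own, with the transition rates matrix $\Lambda(\theta_0)$
and initial distributions $\mu$ and $\tilde\mu$ respectively and, moreover, $S_t = \tilde S_t$ for any
$t\ge\tau$, where $\tau = \inf\{t : S_t = \tilde S_t\}$ is the coupling time, satisfying

$$P_{\theta_0}(\tau>t) = e^{-\min_{i\ne j}(\lambda_{ij}(\theta_0)+\lambda_{ji}(\theta_0))t} \le e^{-\gamma(\theta_0)t}. \tag{3.10}$$

The observation process $\tilde X_t := \int_0^t h(\tilde S_r)\,dr + B_t$ is defined, using the same (!)
Brownian motion as in the definition of $X_t$. Finally $(\tilde\pi^{\theta_0}_t,\tilde\pi^{\theta}_t)$ denote the
solutions of (3.1) and (3.9), driven by $\tilde X$ and started from $\tilde p$ and $\tilde q$, respectively,
where $(\tilde p,\tilde q)$ has distribution $\tilde{\mathcal{M}}$.

The main point of this arrangement is that after the coupling time the
increments of the observation processes $X_t$ and $\tilde X_t$ coincide and hence on
the set $\{\tau\le s\}$

$$\pi^{\theta_0}_{s,t}(\cdot) = \tilde\pi^{\theta_0}_{s,t}(\cdot)\quad\text{and}\quad \pi^{\theta}_{s,t}(\cdot) = \tilde\pi^{\theta}_{s,t}(\cdot),\quad \forall t\ge s$$

with probability one. Then for any $t\ge s\ge 0$ (w.l.o.g. $|\phi|\le 1$ is assumed)

$$\Big|\int\phi\,d\mathcal{M} - \int\phi\,d\tilde{\mathcal{M}}\Big| = \big|E_{\theta_0}\phi(\pi^{\theta_0}_t,\pi^{\theta}_t) - E_{\theta_0}\phi(\tilde\pi^{\theta_0}_t,\tilde\pi^{\theta}_t)\big| \le$$
$$E_{\theta_0}\big|\phi(\pi^{\theta_0}_t,\pi^{\theta}_t) - \phi(\tilde\pi^{\theta_0}_t,\tilde\pi^{\theta}_t)\big|\mathbf{1}_{\{\tau>s\}} + E_{\theta_0}\big|\phi(\pi^{\theta_0}_t,\pi^{\theta}_t) - \phi(\tilde\pi^{\theta_0}_t,\tilde\pi^{\theta}_t)\big|\mathbf{1}_{\{\tau\le s\}} \le$$
$$2P_{\theta_0}(\tau>s) + E_{\theta_0}\big|\phi\big(\pi^{\theta_0}_{s,t}(\pi^{\theta_0}_s),\pi^{\theta}_{s,t}(\pi^{\theta}_s)\big) - \phi\big(\tilde\pi^{\theta_0}_{s,t}(\tilde\pi^{\theta_0}_s),\tilde\pi^{\theta}_{s,t}(\tilde\pi^{\theta}_s)\big)\big|\mathbf{1}_{\{\tau\le s\}} \le$$
$$2P_{\theta_0}(\tau>s) + E_{\theta_0}\big|\phi\big(\pi^{\theta_0}_{s,t}(\pi^{\theta_0}_s),\pi^{\theta}_{s,t}(\pi^{\theta}_s)\big) - \phi\big(\pi^{\theta_0}_{s,t}(\tilde\pi^{\theta_0}_s),\pi^{\theta}_{s,t}(\tilde\pi^{\theta}_s)\big)\big|.$$

The latter can be made arbitrarily small, by taking $s$ and then $t$ large enough
and using (3.10) and (3.5) along with continuity of $\phi$. This verifies uniqueness
of the invariant measure of $(\pi^{\theta_0}_t,\pi^{\theta}_t)$.

The bound (3.8) is derived similarly. We couple the chain $S$ (with initial
distribution $\nu$) to the stationary $\tilde S$, which ensures that on the set $\{\tau\le s\}$,
the corresponding flows $\pi^{\theta_0}_{s,t}(\cdot)$ and $\tilde\pi^{\theta_0}_{s,t}(\cdot)$ coincide. But then for any $t\ge 0$
(below $L_f$ denotes the Lipschitz constant of $f$)

$$\big|E_{\theta_0}f(\pi^{\theta_0}_t,\pi^{\theta}_t)\big| = \big|E_{\theta_0}f(\pi^{\theta_0}_t,\pi^{\theta}_t) - E_{\theta_0}f(\tilde\pi^{\theta_0}_t,\tilde\pi^{\theta}_t)\big| \le$$
$$2P_{\theta_0}(\tau>t/2) + E_{\theta_0}\big|f\big(\pi^{\theta_0}_{t/2,t}(\pi^{\theta_0}_{t/2}),\pi^{\theta}_{t/2,t}(\pi^{\theta}_{t/2})\big) - f\big(\pi^{\theta_0}_{t/2,t}(\tilde\pi^{\theta_0}_{t/2}),\pi^{\theta}_{t/2,t}(\tilde\pi^{\theta}_{t/2})\big)\big| \le$$
$$2P_{\theta_0}(\tau>t/2) + L_f E_{\theta_0}\big\|\pi^{\theta_0}_{t/2,t}(\pi^{\theta_0}_{t/2}) - \pi^{\theta_0}_{t/2,t}(\tilde\pi^{\theta_0}_{t/2})\big\| + L_f E_{\theta_0}\big\|\pi^{\theta}_{t/2,t}(\pi^{\theta}_{t/2}) - \pi^{\theta}_{t/2,t}(\tilde\pi^{\theta}_{t/2})\big\| \le$$
$$C e^{-\frac12[\gamma(\theta_0)\wedge\gamma(\theta)]t}$$

with a constant $C>0$. □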

The combination of the formulae (3.6) and (3.7) involves $1/\pi^{\theta_0}_s$, which is
$P_{\theta_0}$-a.s. bounded on any finite interval (see Corollary 2.2 in [5]). However,
under assumption (1.3), $\pi^{\theta_0}_t$ is repelled from the boundary of $\mathcal{S}^{d-1}$ strongly
enough to guarantee the following uniform integrability:

Lemma 3.7. Assume (1.3), then for any $\mu\in\mathcal{S}^{d-1}$

$$\sup_{t\ge s} E_{\theta_0}\Big(\frac{1}{\min_i\{\pi^{\theta_0}_{s,t}(\mu)\}_i}\Big)^m < \infty,\quad m=1,2,\dots \tag{3.11}$$

uniformly over $\theta_0\in\Theta$.

Proof. The proof follows the arguments of Proposition 3.7 in [5], which verifies
(3.11) for $m=1$. As the equation (3.1) is time homogeneous, no generality
is lost if we assume $s=0$ (and use the shorter notation $\pi^i_t := \{\pi^{\theta_0}_{0,t}(\mu)\}_i$).
By Lemma 3.6 [5], for any $m=1,2,\dots$ and $T\ge 0$

$$E_{\theta_0}\int_0^T (\pi^i_t)^{-m}\,dt < \infty. \tag{3.12}$$

By the Ito formula

$$(\pi^i_t)^{-m} = (\pi^i_0)^{-m} - m\int_0^t (\pi^i_s)^{-m-1}\sum_j\lambda_{ji}\pi^j_s\,ds + \frac12 m(m+1)\int_0^t (\pi^i_s)^{-m}(h^*\pi_s - h_i)^2\,ds + \int_0^t m(\pi^i_s)^{-m}(h^*\pi_s - h_i)\,d\bar B_s.$$

Set $M_t := E_{\theta_0}(\pi^i_t)^{-m}$, then by the Jensen inequality

$$E_{\theta_0}(\pi^i_t)^{-m-1} \ge M_t^{1+1/m}.$$

By (3.12), the expectation of the stochastic integral vanishes and, since
$\min_{j\ne i}\lambda_{ji}>0$ is assumed, we have

$$\frac{d}{dt}M_t \le -K_1 M_t^{1+1/m} + K_2 M_t$$

with constants $K_1>0$ and $K_2$. For any fixed $m$, this differential inequality
implies $\sup_{t\ge 0} M_t<\infty$, which is nothing but (3.11). □
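The last step of the proof of Lemma 3.7 rests on the fact that solutions of $\frac{d}{dt}M_t \le -K_1M_t^{1+1/m}+K_2M_t$ stay bounded. As a hedged numerical illustration (the constants, the initial value and the Euler scheme are arbitrary choices made here, and the inequality is replaced by the corresponding equality, which serves as the comparison equation), the sketch below integrates the equation and checks that the trajectory never exceeds $\max\{M_0,(K_2/K_1)^m\}$.

```python
def integrate(M0, K1, K2, m, T=20.0, dt=0.001):
    """Euler integration of dM/dt = -K1 * M^(1 + 1/m) + K2 * M."""
    M, path = M0, [M0]
    for _ in range(int(round(T / dt))):
        M += dt * (-K1 * M ** (1.0 + 1.0 / m) + K2 * M)
        path.append(M)
    return path

# Hypothetical constants: the nonzero equilibrium is (K2 / K1)^m = 4
path = integrate(M0=100.0, K1=1.0, K2=2.0, m=2)
equilibrium = (2.0 / 1.0) ** 2
```

The superlinear negative term dominates for large $M_t$, which is what forces the trajectory back towards the equilibrium level regardless of the starting point.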

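Remark 3.8. Clearly, the statement of the lemma remains valid for $\pi^{\theta}$, $\theta\ne\theta_0$,
i.e.

$$\sup_{t\ge s} E_{\theta_0}\Big(\frac{1}{\min_i\{\pi^{\theta}_{s,t}(\mu)\}_i}\Big)^m < \infty,\quad m=1,2,\dots$$

uniformly over $\theta,\theta_0\in\Theta$.

The following lemma is an extension (in the case of unperturbed $h$) of
Theorem 1.1 from [5]:

Lemma 3.9. Assume (1.3), then for any $\mu\in\mathcal{S}^{d-1}$ and uniformly over $\theta,\theta_0\in\Theta$

$$\sup_{t\ge s} E_{\theta_0}\big\|\pi^{\theta_0}_{s,t}(\mu) - \pi^{\theta}_{s,t}(\mu)\big\|^m \le C|\theta_0-\theta|^m,\quad m=1,2,\dots \tag{3.13}$$

with a constant $C>0$, possibly dependent on $m$.

Proof. Using (3.6) and (3.7),

$$E_{\theta_0}\big\|\pi^{\theta_0}_{s,t}(\mu) - \pi^{\theta}_{s,t}(\mu)\big\|^m \le C_1\|\Lambda(\theta_0)-\Lambda(\theta)\|^m\, E_{\theta_0}\Big(\int_s^t \frac{e^{-\gamma(\theta)(t-r)}}{\min_k\{\pi^{\theta_0}_{s,r}(\mu)\}_k}\,dr\Big)^m \overset{\dagger}{\le}$$
$$C_1\|\Lambda(\theta_0)-\Lambda(\theta)\|^m\,\gamma^{-(m-1)}(\theta)\int_s^t e^{-\gamma(\theta)(t-r)}\,E_{\theta_0}\Big(\frac{1}{\min_k\{\pi^{\theta_0}_{s,r}(\mu)\}_k}\Big)^m dr \le$$
$$\frac{C_2}{\gamma^m(\theta)}\|\Lambda(\theta_0)-\Lambda(\theta)\|^m \le C|\theta_0-\theta|^m,$$

where $\dagger$ is the Jensen inequality and the last bound is valid as $\Lambda(\theta)$ is
continuously differentiable on $\Theta$. □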
Finally we shall need the following law of large numbers: 
Lemma 3.10. Under the assumption (a-1),

$$E_{\theta_0}\Big(\frac1T\int_0^T\big(h^*\pi^{\theta_0}_t - h^*\pi^{\theta}_t\big)^2dt - g(\theta_0,\theta)\Big)^{2k} \le C_1\frac{|\theta_0-\theta|^{4k}}{T^k} + \frac{C_2}{T^{2k}},\quad k=1,2,\dots \tag{3.14}$$

with constants $C_1$ and $C_2$, possibly dependent on $k$.

Proof. Let $g_t(\theta_0,\theta) := E_{\theta_0}\big(h^*\pi^{\theta_0}_t - h^*\pi^{\theta}_t\big)^2$, then

$$E_{\theta_0}\Big(\frac1T\int_0^T\big(h^*\pi^{\theta_0}_t - h^*\pi^{\theta}_t\big)^2dt - g(\theta_0,\theta)\Big)^{2k} \le$$
$$2^{2k-1}E_{\theta_0}\Big(\frac1T\int_0^T\Big(\big(h^*\pi^{\theta_0}_t - h^*\pi^{\theta}_t\big)^2 - g_t(\theta_0,\theta)\Big)dt\Big)^{2k} + 2^{2k-1}\Big(\frac1T\int_0^T\big(g_t(\theta_0,\theta) - g(\theta_0,\theta)\big)dt\Big)^{2k}. \tag{3.15}$$

The second term in the right hand side of (3.15) contributes $C_2/T^{2k}$ in
(3.14), since by (3.8), $g_t(\theta_0,\theta)$ converges to $g(\theta_0,\theta)$ exponentially fast. The
contribution of the first term in (3.15) is deduced from a version of Lemma
2.1 in [TB]. In particular, this lemma implies that if a zero mean process $\Phi_t$
has a bounded moment of order $2k+\delta$ for some $\delta>0$ and is strong mixing
with the coefficient $\alpha(r)$, decaying to zero sufficiently fast as $r\to\infty$, then

$$E\Big(\int_0^T\Phi_t\,dt\Big)^{2k} \le CT^k,$$

with a constant $C>0$, depending on the moments of $\Phi_t$. This is precisely
the type of estimate needed for (3.14), however, it is not clear whether
$(\pi^{\theta_0}_t,\pi^{\theta}_t)$ is strong mixing. Note that (3.3) (with $\theta$ replaced by $\theta_0$) does
not necessarily imply that $\pi^{\theta_0}_t$ is strong mixing, as it does not even guarantee
that the distribution of $\pi^{\theta_0}_t$ converges to the invariant measure in total
variation norm (only weak convergence follows). Fortunately, the strong
mixing property is not crucial for the claim of this lemma and it can be
modified to suit our purposes. The exact formulation of an analogous statement,
namely Lemma A.1, and its proof are given in Appendix A.

We aim to apply Lemma A.1 to the process $\Phi(t) := \big(h^*\pi^{\theta}_t - h^*\pi^{\theta_0}_t\big)^2 -
g_t(\theta_0,\theta)$. By the definition $E_{\theta_0}\Phi(t) = 0$ and by Lemma 3.9 the condition
(A.1) is satisfied with $b := (\theta_0-\theta)^2$. So to prove

$$E_{\theta_0}\Big(\frac1T\int_0^T\Phi(t)\,dt\Big)^{2k} \le c\,\frac{|\theta_0-\theta|^{4k}}{T^k}, \tag{3.16}$$

we shall show that (A.2) holds, i.e. for any $n\ge 2$

$$\big|E_{\theta_0}\Phi(t_1)\cdots\Phi(t_n) - E_{\theta_0}\Phi(t_1)\cdots\Phi(t_i)\,E_{\theta_0}\Phi(t_{i+1})\cdots\Phi(t_n)\big| \le C_n b^n\alpha(t_{i+1}-t_i)$$

with exponential $\alpha(\tau)$.
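By the formula (3.7) (with $\theta$ and $\theta_0$ interchanged), for $s\le t$

$$\pi^{\theta_0}_t - \pi^{\theta}_t = \int_0^s D\pi^{\theta_0}_{r,t}(\pi^{\theta}_r)\cdot\big(\Lambda^*(\theta_0)-\Lambda^*(\theta)\big)\pi^{\theta}_r\,dr +$$
$$\int_s^t D\pi^{\theta_0}_{r,t}\big(\pi^{\theta}_{s,r}(\pi^{\theta}_s)\big)\cdot\big(\Lambda^*(\theta_0)-\Lambda^*(\theta)\big)\pi^{\theta}_{s,r}(\pi^{\theta}_s)\,dr := I^t_s + J_{s,t}(\pi^{\theta}_s).$$

Recall that the pair $(\pi^{\theta_0}_t,\pi^{\theta}_t)$ is a Markov process under $P_{\theta_0}$ and let $\mathcal{G}_t$
denote its natural filtration. Using (3.6) and (3.11), we get

$$\big(E_{\theta_0}\|I^t_s\|^m\big)^{1/m} \le C_1\|\Lambda(\theta_0)-\Lambda(\theta)\|\Big(E_{\theta_0}\Big(\int_0^s\frac{e^{-\gamma(\theta_0)(t-r)}}{\min_k\{\pi^{\theta}_r\}_k}\,dr\Big)^m\Big)^{1/m} \le C_2|\theta_0-\theta|e^{-\gamma(\theta_0)(t-s)},\quad m=1,2,\dots \tag{3.17}$$

where the latter inequality is deduced as in the proof of Lemma 3.9. Similarly
we have

$$\big(E_{\theta_0}\|J_{s,t}(\pi^{\theta}_s)\|^m\big)^{1/m} \le C_3|\theta_0-\theta|.$$

Further, for any $\mu_1,\mu_2\in\mathcal{S}^{d-1}$,

$$J_{s,t}(\mu_1) - J_{s,t}(\mu_2) = \int_s^t D\pi^{\theta_0}_{r,t}\big(\pi^{\theta}_{s,r}(\mu_1)\big)\cdot\big(\Lambda^*(\theta_0)-\Lambda^*(\theta)\big)\pi^{\theta}_{s,r}(\mu_1)\,dr -$$
$$\int_s^t D\pi^{\theta_0}_{r,t}\big(\pi^{\theta}_{s,r}(\mu_2)\big)\cdot\big(\Lambda^*(\theta_0)-\Lambda^*(\theta)\big)\pi^{\theta}_{s,r}(\mu_2)\,dr =$$
$$\int_s^t D\pi^{\theta_0}_{r,t}\big(\pi^{\theta}_{s,r}(\mu_1)\big)\cdot\big(\Lambda^*(\theta_0)-\Lambda^*(\theta)\big)\big(\pi^{\theta}_{s,r}(\mu_1) - \pi^{\theta}_{s,r}(\mu_2)\big)\,dr +$$
$$\int_s^t\Big(D\pi^{\theta_0}_{r,t}\big(\pi^{\theta}_{s,r}(\mu_1)\big) - D\pi^{\theta_0}_{r,t}\big(\pi^{\theta}_{s,r}(\mu_2)\big)\Big)\cdot\big(\Lambda^*(\theta_0)-\Lambda^*(\theta)\big)\pi^{\theta}_{s,r}(\mu_2)\,dr := R + Q. \tag{3.18}$$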
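The bound (3.5) and the formula (3.6) yield for any $m=1,2,\dots$

$$\big(E_{\theta_0}\|R\|^m\big)^{1/m} \le C_4\|\Lambda(\theta_0)-\Lambda(\theta)\|\Big(E_{\theta_0}\Big(\int_s^t\frac{e^{-\gamma(\theta_0)(t-r)}e^{-\gamma(\theta)(r-s)}}{\min_k\{\pi^{\theta}_{s,r}(\mu_1)\}_k}\,dr\Big)^m\Big)^{1/m} \le$$
$$C_5|\theta_0-\theta|\exp\big\{-\tfrac12[\gamma(\theta_0)\wedge\gamma(\theta)](t-s)\big\}.$$

Using (3.6) (and utilizing its particular dependence on $\mu$) and (3.5)

$$\big(E_{\theta_0}\|Q\|^m\big)^{1/m} \le C_6\|\Lambda(\theta_0)-\Lambda(\theta)\|\times$$
$$\Big(E_{\theta_0}\Big(\int_s^t\frac{1}{\min_k\{\pi^{\theta}_{s,r}(\mu_2)\}_k}\,\frac{1}{\min_k\{\pi^{\theta}_{s,r}(\mu_1)\}_k}\,e^{-\gamma(\theta_0)(t-r)}e^{-\gamma(\theta)(r-s)}\,dr\Big)^m\Big)^{1/m}$$
$$\le C_7|\theta_0-\theta|\exp\big\{-\tfrac12[\gamma(\theta_0)\wedge\gamma(\theta)](t-s)\big\}.$$

Hence, for any $\mu_1,\mu_2\in\mathcal{S}^{d-1}$

$$\big(E_{\theta_0}\|J_{s,t}(\mu_1) - J_{s,t}(\mu_2)\|^m\big)^{1/m} \le C_8|\theta_0-\theta|\exp\big\{-\tfrac12[\gamma(\theta_0)\wedge\gamma(\theta)](t-s)\big\}. \tag{3.19}$$

Below we shall use the fact, that if $\zeta_1,\dots,\zeta_m$ are random variables (depending
on a parameter $b>0$), such that $\big(E|\zeta_j|^k\big)^{1/k}\le C_{jk}b$ for any $k\ge 1$,
then by the Hölder inequality for integers $k_1,\dots,k_m$

$$E|\zeta_1|^{k_1}\cdots|\zeta_m|^{k_m} \le C|b|^{k_1+\dots+k_m}, \tag{3.20}$$

with a constant depending on $k_j$'s and $m$.
For any $s\le t$,

$$\Phi_t = \big(h^*\pi^{\theta_0}_t - h^*\pi^{\theta}_t\big)^2 - g_t(\theta_0,\theta) = \big(h^*I^t_s + h^*J_{s,t}(\pi^{\theta}_s)\big)^2 - g_t(\theta_0,\theta) =$$
$$h^*I^t_s\big[h^*I^t_s + 2h^*J_{s,t}(\pi^{\theta}_s)\big] + \big(h^*J_{s,t}(\pi^{\theta}_s)\big)^2 - g_t(\theta_0,\theta).$$

The inequalities (3.20) and (3.17) imply

$$E_{\theta_0}|\Psi_{s,t}| := E_{\theta_0}\big|h^*I^t_s\big[h^*I^t_s + 2h^*J_{s,t}(\pi^{\theta}_s)\big]\big| \le C_9|\theta_0-\theta|^2e^{-\gamma(\theta_0)(t-s)}. \tag{3.21}$$

Further,

$$E_{\theta_0}\Phi_{t_1}\cdots\Phi_{t_n} - E_{\theta_0}\Phi_{t_1}\cdots\Phi_{t_i}\,E_{\theta_0}\Phi_{t_{i+1}}\cdots\Phi_{t_n} =$$
$$E_{\theta_0}\Phi_{t_1}\cdots\Phi_{t_i}\Big(E_{\theta_0}\big(\Phi_{t_{i+1}}\cdots\Phi_{t_n}\,\big|\,\mathcal{G}_{t_i}\big) - E_{\theta_0}\Phi_{t_{i+1}}\cdots\Phi_{t_n}\Big). \tag{3.22}$$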
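Substituting $\Phi_{t_j} = \Psi_{t_i,t_j} + \varphi_{t_i,t_j}(\pi^{\theta}_{t_i})$, $j=i+1,\dots,n$, where
$\varphi_{s,t}(\mu) := \big(h^*J_{s,t}(\mu)\big)^2 - g_t(\theta_0,\theta)$, expanding all the
expressions into monomials and using the bound (3.21), we see that the
right hand side of (3.22) is bounded by a sum of terms of the form
$c_j|\theta_0-\theta|^{2n}e^{-\gamma(\theta_0)(t_j-t_i)}$, $j=i+1,\dots,n$ (and hence altogether bounded by
$C|\theta_0-\theta|^{2n}e^{-\gamma(\theta_0)(t_{i+1}-t_i)}$ for some $C>0$) and the term

$$E_{\theta_0}\big|\Phi_{t_1}\cdots\Phi_{t_i}\big|\,\Big|E_{\theta_0}\big(\varphi_{t_i,t_{i+1}}(\pi^{\theta}_{t_i})\cdots\varphi_{t_i,t_n}(\pi^{\theta}_{t_i})\,\big|\,\mathcal{G}_{t_i}\big) - E_{\theta_0}\varphi_{t_i,t_{i+1}}(\pi^{\theta}_{t_i})\cdots\varphi_{t_i,t_n}(\pi^{\theta}_{t_i})\Big| \le$$
$$E_{\theta_0}\big|\Phi_{t_1}\cdots\Phi_{t_i}\big|\,\tilde E_{\theta_0}\Big|\varphi_{t_i,t_{i+1}}(\pi^{\theta}_{t_i})\cdots\varphi_{t_i,t_n}(\pi^{\theta}_{t_i}) - \varphi_{t_i,t_{i+1}}(\tilde\pi^{\theta}_{t_i})\cdots\varphi_{t_i,t_n}(\tilde\pi^{\theta}_{t_i})\Big|, \tag{3.23}$$

where $\tilde E_{\theta_0}$ denotes expectation over an auxiliary probability space $(\tilde\Omega,\tilde{\mathcal{F}},\tilde P)$,
on which $\tilde\pi^{\theta}$ is defined as a copy of $\pi^{\theta}$.
Using the elementary summation formula

$$f_1(x)\cdots f_n(x) - f_1(y)\cdots f_n(y) = \{f_1(x)-f_1(y)\}f_2(x)\cdots f_n(x) +$$
$$f_1(y)\{f_2(x)-f_2(y)\}f_3(x)\cdots f_n(x) + \dots + f_1(y)\cdots f_{n-1}(y)\{f_n(x)-f_n(y)\},$$

and the bounds (3.18), (3.19) and (3.20), we conclude that the expression
in (3.23) is bounded by a sum of terms of the form

$$C_{i,j}|\theta_0-\theta|^{2n}\exp\big\{-\tfrac12[\gamma(\theta_0)\wedge\gamma(\theta)](t_j-t_i)\big\},\quad j=i+1,\dots,n$$

and hence by

$$C|\theta_0-\theta|^{2n}\exp\big\{-\tfrac12[\gamma(\theta_0)\wedge\gamma(\theta)](t_{i+1}-t_i)\big\}$$

with some $C>0$. This verifies the condition (A.2) of Lemma A.1, which
yields (3.16) and in turn the required bound (3.14). □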


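3.2. The proof of Theorem 1.1. The proof verifies the conditions of Theorem
2.1 and follows the lines of the proof of Theorem 2.8 in [19] with the
adjustments, based on the properties, derived in the preceding section.

Lemma 3.11. Assume (a-1) of Theorem 1.1, then (1a) of Theorem 2.1
holds for any even $m\ge 2$.

Proof. For an integer $k\ge 1$, define $V_T := \big(Z_T(u_2)/Z_T(u_1)\big)^{1/2k}$, then

$$E_{\theta_0}\big|Z_T^{1/2k}(u_2) - Z_T^{1/2k}(u_1)\big|^{2k} = E_{\theta_0}Z_T(u_1)(1-V_T)^{2k} = E_{\theta^T_{u_1}}(1-V_T)^{2k},$$

where the notation $\theta^T_{u_1} = \theta_0 + u_1/\sqrt{T}$ is used for brevity. Recall that

$$\frac{Z_T(u_2)}{Z_T(u_1)} = \exp\Big\{\int_0^T\big(h^*\pi^{\theta^T_{u_2}}_t - h^*\pi^{\theta^T_{u_1}}_t\big)d\bar B_t - \frac12\int_0^T\big(h^*\pi^{\theta^T_{u_2}}_t - h^*\pi^{\theta^T_{u_1}}_t\big)^2dt\Big\},$$

where $\bar B = (\bar B_t)_{t\ge 0}$ is the innovation Brownian motion under $P_{\theta^T_{u_1}}$. Let
$\delta_t := h^*\pi^{\theta^T_{u_2}}_t - h^*\pi^{\theta^T_{u_1}}_t$, then by the Ito formula

$$V_T = 1 - \frac{2k-1}{8k^2}\int_0^T V_t\delta_t^2\,dt + \frac{1}{2k}\int_0^T V_t\delta_t\,d\bar B_t$$

and hence

$$E_{\theta^T_{u_1}}(1-V_T)^{2k} \le C_1T^{k-1}\int_0^T E_{\theta^T_{u_1}}(V_t\delta_t)^{2k}dt + C_2T^{2k-1}\int_0^T E_{\theta^T_{u_1}}(V_t\delta_t^2)^{2k}dt =$$
$$C_1T^{k-1}\int_0^T E_{\theta^T_{u_2}}(\delta_t)^{2k}dt + C_2T^{2k-1}\int_0^T E_{\theta^T_{u_2}}(\delta_t)^{4k}dt \le$$
$$C_3(u_1-u_2)^{2k} + C_4(u_1-u_2)^{4k},$$

where the bound (3.13) has been used in the latter inequality. This implies

$$(u_1-u_2)^{-2k}E_{\theta_0}\big|Z_T^{1/2k}(u_1) - Z_T^{1/2k}(u_2)\big|^{2k} \le C_3(1+R^{2k}),$$

with a constant $C_3$, depending only on the compact $K$ and $k$, as required. □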



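Lemma 3.12. Under the assumptions of Theorem 1.1, (1b) of Theorem 2.1
holds.

Proof. Instead of (1b) we shall verify the sufficient condition

$$P_{\theta_0}\big(Z_T(u) > e^{-\kappa u^2/4}\big) \le \frac{C_m}{|u|^{2m}}\quad\text{for any integer } m\ge 1. \tag{3.24}$$

Indeed, by the Cauchy-Schwarz inequality

$$E_{\theta_0}Z_T^{1/2}(u) \le E_{\theta_0}\mathbf{1}_{\{Z_T(u)>e^{-\kappa u^2/4}\}}Z_T^{1/2}(u) + e^{-\kappa u^2/8} \le$$
$$\sqrt{P_{\theta_0}\big(Z_T(u)>e^{-\kappa u^2/4}\big)}\sqrt{E_{\theta_0}Z_T(u)} + e^{-\kappa u^2/8} \le \frac{\sqrt{C_m}}{|u|^m} + e^{-\kappa u^2/8}$$

and since the latter is required to hold for any $m\ge 1$, (1b) of Theorem 2.1
holds as well.

The formula (3.7) (with $\theta$ and $\theta_0$ swapped) implies

$$h^*\pi^{\theta}_t - h^*\pi^{\theta_0}_t = \int_0^t h^*D\pi^{\theta_0}_{s,t}(\pi^{\theta}_s)\cdot\big(\Lambda^*(\theta)-\Lambda^*(\theta_0)\big)\pi^{\theta}_s\,ds,$$

and as $\Lambda(\theta)$ has a continuous second derivative $\Lambda''(\theta)$

$$h^*\pi^{\theta}_t - h^*\pi^{\theta_0}_t = (\theta-\theta_0)\int_0^t h^*D\pi^{\theta_0}_{s,t}(\pi^{\theta}_s)\cdot\Lambda'^*(\theta_0)\pi^{\theta}_s\,ds +$$
$$\frac12(\theta-\theta_0)^2\int_0^t h^*D\pi^{\theta_0}_{s,t}(\pi^{\theta}_s)\cdot\Lambda''^*(\tilde\theta)\pi^{\theta}_s\,ds$$
$$=: (\theta-\theta_0)\alpha_t(\theta_0,\theta) + (\theta-\theta_0)^2\beta_t(\theta_0,\theta)$$

with $\tilde\theta\in[\theta_0,\theta]$. Due to the property (3.6), $\sup_{t\ge 0}E_{\theta_0}|\alpha_t(\theta_0,\theta)|^2<\infty$ and
$\sup_{t\ge 0}E_{\theta_0}|\beta_t(\theta_0,\theta)|^2<\infty$, hence

$$g_t(\theta_0,\theta) = (\theta-\theta_0)^2E_{\theta_0}\big(\alpha_t(\theta_0,\theta)\big)^2 + o\big((\theta_0-\theta)^2\big), \tag{3.25}$$

where $o(\cdot)$ is uniform in $t\ge 0$. Note that $\alpha_t(\theta_0,\theta_0) = h^*\dot\pi^{\theta_0}_t$ and

$$\alpha_t(\theta_0,\theta_0) - \alpha_t(\theta_0,\theta) = \int_0^t h^*D\pi^{\theta_0}_{s,t}(\pi^{\theta_0}_s)\cdot\Lambda'^*(\theta_0)\pi^{\theta_0}_s\,ds - \int_0^t h^*D\pi^{\theta_0}_{s,t}(\pi^{\theta}_s)\cdot\Lambda'^*(\theta_0)\pi^{\theta}_s\,ds =$$
$$\int_0^t h^*D\pi^{\theta_0}_{s,t}(\pi^{\theta_0}_s)\cdot\Lambda'^*(\theta_0)\big(\pi^{\theta_0}_s - \pi^{\theta}_s\big)ds + \int_0^t h^*\Big(D\pi^{\theta_0}_{s,t}(\pi^{\theta_0}_s) - D\pi^{\theta_0}_{s,t}(\pi^{\theta}_s)\Big)\cdot\Lambda'^*(\theta_0)\pi^{\theta}_s\,ds.$$

Now using the formula (3.6) and the bound (3.13), we find that

$$\sup_{t\ge 0}E_{\theta_0}\big(\alpha_t(\theta_0,\theta_0) - \alpha_t(\theta_0,\theta)\big)^2 \le c\,(\theta_0-\theta)^2.$$

This and (3.25) imply

$$g_t(\theta_0,\theta) = (\theta-\theta_0)^2E_{\theta_0}\big(\alpha_t(\theta_0,\theta_0)\big)^2 + o\big((\theta_0-\theta)^2\big) = (\theta-\theta_0)^2E_{\theta_0}\big(h^*\dot\pi^{\theta_0}_t\big)^2 + o\big((\theta_0-\theta)^2\big)$$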
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and hence, by the assumption ()a-3p and (13. Sp . 



5(^0, e) = hm gt{eo, 0) = {00 - 0f Km (/i*^^)' + o((0o - 0?) = 

{eo-0fl{0o) + o{{0o-0)^) 



t—*OD 



Thus for some smah enough r > 



g{0o,0) > -I{eo)i0o-0?, y\0o-0\<r, 



(3.26) 



uniformly over 0q E K. Since q{r) := inf^^gK inf|0jj„g|>r (j((6'o, ^) is strictly 
positive by the assumption (|a-2p . we have 

uniformly over £ IK.) where I©! denotes the diameter of 0. Hence, with 
K := r A q{r) > 0, 

In particular, we have Tg{0Q,0'^) > ku^, whenever u belongs to a compact 
in Vt- Further 



rT 



T 



el 



1 



T 



h* ( TTj" - <o \dBt-- I ( /i*^;" - /i*vrr I c^t > --u 







, * si 



1 

2./0 



T 



.*eT 



^f" - /iVf')' - 5(^0, C)J dt >-ju' + ^g{0o, 0l) ) < 



K 2 T 







dt >-u^\< 



Bo 



h^nf -h*7Tf"Y -g{0o,0^) 



dt 
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Now by the Chebyshev inequality, ()3.13l) and using bounds for the moments 
of stochastic integral (see e.g. [24] ) 



>- -s"') ^ 

2m 



-00 



2m 



2m 



— im(2m-l)rT 



^rpm-l II 1, 1 1 2m 







2m 



T 

2 m 



< 



dt < 



(m(2m-l))'"r 



m^m-l||^||2m^^^'" 



u 



2m ■ 



Using the estimate p.l4p . 







5(^0,0 



dt 



> -u'\ < 



AT 



2m 



T 



2m 



dt 



< 



2m 



2 ) 



I 2-3/2 I 



2m 



+ 



J^2n 



< 



2 m 



2m ( 



2m 



) 



Cs 



u 



2m ■ 



where we used the fact $|u/\sqrt{T}| \le |\Theta|$ (the diameter of $\Theta$). This verifies
(3.24) and thus the statement of the lemma. □

Lemma 3.13. Under the assumptions of Theorem 1.1, the finite dimensional
distributions of $Z_T(u)$ converge weakly to those of the process

$$Z(u) = \exp\Big\{u\sqrt{I(\theta_0)}\,\zeta - \frac{1}{2}u^2 I(\theta_0)\Big\}, \qquad u \in \mathbb{R},$$

uniformly in $\theta_0 \in K$, where $\zeta$ is a standard Gaussian random variable (i.e.
(2) of Theorem 2.1 holds). In particular,

$$\hat u := \operatorname{argmax}_{u\in\mathbb{R}} Z(u)$$

is a zero mean Gaussian random variable with variance $1/I(\theta_0)$ (i.e. (3) of
Theorem 2.1 holds as well).



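Proof. Recall the definition of the process $Z_T(u)$:

$$Z_T(u) = \exp\Big\{\int_0^T \big(h^*\pi_t^{\theta_0+u/\sqrt{T}} - h^*\pi_t^{\theta_0}\big)\,d\bar B_t - \frac{1}{2}\int_0^T \big(h^*\pi_t^{\theta_0+u/\sqrt{T}} - h^*\pi_t^{\theta_0}\big)^2\,dt\Big\}.$$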
Using (3.7), (3.6) and (3.11), similarly to the proof of (3.26), we have



/ I)<U7rf«+0 • (A*(eo + 5) - A*(^o))7rf°+^d.- 

^0 



0(5^) 



uniformly in t > 0. Hence 







which implies 



and in turn, by (a-3),



do 



(it 



T^oo 



lim 



do 



dt = u^Ii9o), 



uniformly on compacts in $\Theta$. By the CLT for stochastic integrals (see e.g.
Proposition 1.20 in [19]),

$$\int_0^T \big(h^*\pi_t^{\theta_0+u/\sqrt{T}} - h^*\pi_t^{\theta_0}\big)\,d\bar B_t$$

converges weakly to a Gaussian random variable with zero mean and variance
$u^2 I(\theta_0)$, uniformly on compacts in $\Theta$. This implies the weak convergence
of the one (and all finite) dimensional distributions of $Z_T(u)$ to
$Z(u)$. By (3.13), $I(\theta_0)$ is finite and, assuming that it is positive uniformly on
compacts in $\Theta$, the maximizer of $Z(u)$ is unique and equals $\hat u = \zeta/\sqrt{I(\theta_0)}$
as claimed. □
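
The last claim of the lemma can be checked numerically. The sketch below is only an illustration, not part of the proof: it maximizes $\log Z(u) = u\sqrt{I}\,\zeta - \frac{1}{2}u^2 I$ over a finite grid and compares the maximizer with $\zeta/\sqrt{I}$. The value of the Fisher information, the grid and the function names are assumptions made for the example.

```python
import numpy as np

def limit_log_likelihood(u, zeta, fisher):
    """log Z(u) = u * sqrt(I) * zeta - u^2 * I / 2 for the limit process."""
    return u * np.sqrt(fisher) * zeta - 0.5 * u**2 * fisher

def sample_argmax(fisher, n_samples=500, half_width=10.0, num=4001, seed=1):
    """Maximize log Z(u) on a grid for independent standard Gaussian zeta.

    Returns the grid maximizers, the sampled zeta values and the grid step.
    The grid [-half_width, half_width] is a hypothetical truncation of the
    real line, wide enough for the sampled values of zeta / sqrt(I).
    """
    rng = np.random.default_rng(seed)
    u_grid = np.linspace(-half_width, half_width, num)
    zetas = rng.standard_normal(n_samples)
    u_hat = np.array([
        u_grid[np.argmax(limit_log_likelihood(u_grid, z, fisher))]
        for z in zetas
    ])
    return u_hat, zetas, u_grid[1] - u_grid[0]

if __name__ == "__main__":
    fisher = 2.0  # assumed value of I(theta_0), chosen only for illustration
    u_hat, zetas, step = sample_argmax(fisher)
    print("max |u_hat - zeta/sqrt(I)| :", np.max(np.abs(u_hat - zetas / np.sqrt(fisher))))
    print("sample variance of u_hat   :", np.var(u_hat), "vs 1/I =", 1.0 / fisher)
```

Up to the grid resolution the maximizer coincides with $\zeta/\sqrt{I}$, and its sample variance is close to $1/I$.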

4. An example 

In this section we demonstrate with a simple example how the conditions
of Theorem 1.1 can be verified explicitly. Let $S_t$ be a Markov chain with
values in $\{0, 1\}$, initial distribution $P(S_0 = 1) = \nu$ and transition rate matrix

$$\Lambda = \theta \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix},$$

where $\theta$ is an unknown parameter, which controls the switching rate of the
chain. Suppose it is known that the actual value of this parameter lies
within an interval $\Theta := (\theta_{\min}, \theta_{\max}) \subset \mathbb{R}_+$, $\theta_{\max} > \theta_{\min} > 0$. The chain is
observed in the Gaussian white noise channel, i.e.

$$X_t = \int_0^t S_r\,dr + B_t, \qquad t \ge 0.$$
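The filtering process $\pi_t^\theta = P_\theta(S_t = 1 \mid \mathcal{F}_t^X)$ in this case satisfies the SDE

$$d\pi_t^\theta = \theta(1 - 2\pi_t^\theta)\,dt + \pi_t^\theta(1 - \pi_t^\theta)\big(dX_t - \pi_t^\theta\,dt\big), \qquad t \in [0, T], \qquad (4.1)$$

started from $\pi_0^\theta = \nu$. The likelihood function is

$$L_T(\theta; X^T) = \exp\Big\{\int_0^T \pi_t^\theta\,dX_t - \frac{1}{2}\int_0^T (\pi_t^\theta)^2\,dt\Big\}.$$

The MLE of $\theta$ is found by maximizing $L_T(\theta; X^T)$ over $\theta \in \bar\Theta = [\theta_{\min}, \theta_{\max}]$.
The condition (a-1) is satisfied and we should check (a-2) and (a-3).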
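
For illustration only, the estimator of this example can be approximated on a time grid. The sketch below is a hedged Euler discretization, not the procedure analyzed in the paper: the step size, the grid search over $[\theta_{\min}, \theta_{\max}]$, the clipping of the filter to $(0,1)$ and all function names are assumptions of the example.

```python
import numpy as np

def simulate_observations(theta0, T, dt, nu=0.5, rng=None):
    """Simulate increments dX of X_t = int_0^t S_r dr + B_t, where S is the
    symmetric two-state chain with switching rate theta0 (time step dt)."""
    rng = np.random.default_rng() if rng is None else rng
    n = int(round(T / dt))
    s = float(rng.random() < nu)           # S_0 = 1 with probability nu
    dX = np.empty(n)
    for k in range(n):
        dX[k] = s * dt + np.sqrt(dt) * rng.standard_normal()
        if rng.random() < theta0 * dt:     # switch with probability ~ theta0 * dt
            s = 1.0 - s
    return dX

def log_likelihood(theta, dX, dt, nu=0.5):
    """Euler approximation of log L_T(theta; X^T), with the filter driven by (4.1)."""
    pi, ll = nu, 0.0
    for dx in dX:
        ll += pi * dx - 0.5 * pi**2 * dt
        pi += theta * (1 - 2 * pi) * dt + pi * (1 - pi) * (dx - pi * dt)
        pi = min(max(pi, 1e-9), 1 - 1e-9)  # keep the Euler iterate inside (0, 1)
    return ll

def mle_on_grid(dX, dt, theta_min, theta_max, num=25, nu=0.5):
    """Grid-search surrogate for maximizing the likelihood over [theta_min, theta_max]."""
    grid = np.linspace(theta_min, theta_max, num)
    values = np.array([log_likelihood(th, dX, dt, nu) for th in grid])
    return grid[np.argmax(values)], grid, values

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dX = simulate_observations(theta0=1.0, T=200.0, dt=0.01, rng=rng)
    theta_hat, _, _ = mle_on_grid(dX, 0.01, theta_min=0.5, theta_max=2.0)
    print("grid MLE of theta:", theta_hat)
```

The grid search is used only because it needs no derivative of the likelihood; any one-dimensional optimizer over the closed interval could replace it.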

4.1. The identifiability condition (a-2). Let $(\pi_t^{\theta_0}, \pi_t^{\theta})$ be a stationary
(under $P_{\theta_0}$) copy of the process defined by

$$d\pi_t^{\theta_0} = \theta_0(1 - 2\pi_t^{\theta_0})\,dt + \pi_t^{\theta_0}(1 - \pi_t^{\theta_0})\,d\bar B_t,$$

$$d\pi_t^{\theta} = \theta(1 - 2\pi_t^{\theta})\,dt + \pi_t^{\theta}(1 - \pi_t^{\theta})\big(d\bar B_t + (\pi_t^{\theta_0} - \pi_t^{\theta})\,dt\big),$$

where $d\bar B_t = dX_t - \pi_t^{\theta_0}\,dt$ is the corresponding innovation Brownian motion
with respect to $\mathcal{F}_t^X \vee \sigma\{\pi_0^{\theta_0}\}$. Introduce an auxiliary process $q_t^\theta$, solving the
equation

$$dq_t^\theta = \theta(1 - 2q_t^\theta)\,dt + q_t^\theta(1 - q_t^\theta)\,d\bar B_t,$$

subject to $q_0^\theta = \pi_0^\theta$. Heuristically, it is clear that if $|\pi_t^{\theta_0} - \pi_t^{\theta}|$ is small on
average for $|\theta - \theta_0| \ge r > 0$, then the distribution of $\pi_t^\theta$ should be close to the
distribution of $q_t^\theta$. But the latter satisfies an Ito equation, corresponding to
the filtering problem for the signal with the switching rate $\theta$. This, in turn,
would imply that the signals with well separated $\theta$ and $\theta_0$ can be filtered
with the same steady state error. The latter can be argued to be false in our case,
and hence $\pi_t^{\theta_0}$ and $\pi_t^{\theta}$ cannot be close. The rest is the precise realization of
this heuristic.

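The difference $\Delta_t := \pi_t^\theta - q_t^\theta$ solves

$$d\Delta_t = -2\theta\Delta_t\,dt + a_t\,dt + \Delta_t(1 - \pi_t^\theta - q_t^\theta)\,d\bar B_t, \qquad \Delta_0 = 0,$$

where $a_t = \pi_t^\theta(1 - \pi_t^\theta)(\pi_t^{\theta_0} - \pi_t^{\theta})$, and hence $V_t = E_{\theta_0}\Delta_t^2$ satisfies

$$\dot V_t = -4\theta V_t + 2E_{\theta_0}\Delta_t a_t + E_{\theta_0}\Delta_t^2(1 - \pi_t^\theta - q_t^\theta)^2 \le C_1 V_t + C_2\sqrt{E_{\theta_0}(\pi_t^{\theta_0} - \pi_t^{\theta})^2}.$$

This implies

$$V_t \le \frac{C_2}{C_1}\big(e^{C_1 t} - 1\big)\sqrt{E_{\theta_0}(\pi_t^{\theta_0} - \pi_t^{\theta})^2}$$

and hence

$$E_{\theta_0}(\pi_t^{\theta_0} - q_t^\theta)^2 \le 2E_{\theta_0}(\pi_t^{\theta_0} - \pi_t^{\theta})^2 + 2E_{\theta_0}(\pi_t^{\theta} - q_t^\theta)^2 \le 2E_{\theta_0}(\pi_t^{\theta_0} - \pi_t^{\theta})^2 + 2\frac{C_2}{C_1}\big(e^{C_1 t} - 1\big)\sqrt{E_{\theta_0}(\pi_t^{\theta_0} - \pi_t^{\theta})^2} \le \rho(t)\sqrt{E_{\theta_0}(\pi_t^{\theta_0} - \pi_t^{\theta})^2} \qquad (4.2)$$

with $\rho(t) := 2 + 2\frac{C_2}{C_1}\big(e^{C_1 t} - 1\big) > 0$ for any $t > 0$ (regardless of the sign of
$C_1$). On the other hand, by the Jensen inequality

$$\big|E_{\theta_0}(\pi_t^{\theta_0})^2 - E_{\theta_0}(q_t^\theta)^2\big| \le 2E_{\theta_0}\big|\pi_t^{\theta_0} - q_t^\theta\big| \le 2\sqrt{E_{\theta_0}(\pi_t^{\theta_0} - q_t^\theta)^2}.$$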



The distribution of $q_t^\theta$ converges weakly to the stationary distribution of $\pi_t^\theta$
under $P_\theta$ (and not $P_{\theta_0}$). Thus for
any $\varepsilon > 0$ we may choose $T(\varepsilon)$ large enough such that

$$\big|E_{\theta_0}(q_t^\theta)^2 - E_\theta(\pi_t^\theta)^2\big| \le \varepsilon, \qquad \forall\, t \ge T(\varepsilon).$$

The distributions of $\pi_t^{\theta_0}$ (under $P_{\theta_0}$) and of $\pi_t^{\theta}$ (under $P_\theta$) can be found
explicitly by solving the corresponding Kolmogorov equations (see e.g. Section
15.4, [25]), and $E_{\theta_0}(\pi_t^{\theta_0})^2 \ne E_\theta(\pi_t^\theta)^2$ whenever $\theta \ne \theta_0$ is checked by
direct calculation. Moreover, $E_\theta(\pi_t^\theta)^2$ is easily seen to be continuous in $\theta$ on
$[\theta_{\min}, \theta_{\max}]$, and thus

But then by for any \6o - e\ > r > 



..0 



> 



Ee,, 1 vr: 



Eeo(gf; 



16p2(t) 
1/8 



> 



Eeo I TT- 



1^4/ 



> 



16p2(T(e)) ^ 16p2(r(e))- 

The required property (1.4) now follows by arbitrariness of $\varepsilon$ and positiveness
of $\rho(t)$, $t > 0$.

Remark 4.1. The coupling argument, used in this example, is applicable in
the general $d > 2$ case, namely $E_{\theta_0}(h^*\pi_t^{\theta_0} - h^*\pi_t^{\theta})^2$ can be similarly shown to
be lower bounded by a quantity, proportional to $\big|E_{\theta_0} f(h^*\pi_t^{\theta_0}) - E_\theta f(h^*\pi_t^{\theta})\big|$
with $f(x) = x^2$ or $f(x) = x$, etc. The latter means that the stationary
laws of two $d$-dimensional diffusions are to be studied, instead of the law of the
$2d$-dimensional diffusion $(\pi_t^{\theta_0}, \pi_t^{\theta})$. In particular the identifiability follows, if
one is able to show that the laws of $h^*\pi_t^{\theta_0}$ under $P_{\theta_0}$ and $h^*\pi_t^{\theta}$ under $P_\theta$ have
different moments, uniformly for separated $\theta$ and $\theta_0$. In the example, this
was possible due to the explicit expression available for the probability density
of the filtering process when $d = 2$.

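4.2. The regularity condition (a-3). The derivative $\dot\pi_t^{\theta_0}$ satisfies the equation

$$d\dot\pi_t^{\theta_0} = (1 - 2\pi_t^{\theta_0})\,dt - \big(2\theta_0 + \pi_t^{\theta_0}(1 - \pi_t^{\theta_0})\big)\dot\pi_t^{\theta_0}\,dt + (1 - 2\pi_t^{\theta_0})\dot\pi_t^{\theta_0}\,d\bar B_t, \qquad \dot\pi_0^{\theta_0} = 0, \qquad (4.3)$$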
and hence the pair $(\pi_t^{\theta_0}, \dot\pi_t^{\theta_0})$ is a Markov-Feller process. The formula (3.7)
yields

7rf« = -2 / Z)7r5(7r^)7rf»ds, 
Jo 

and, in turn, the bound (3.6) implies

$$\sup_{t\ge 0} E_{\theta_0}\big(\dot\pi_t^{\theta_0}\big)^2 < \infty. \qquad (4.4)$$

This guarantees existence of at least one invariant measure for $(\pi_t^{\theta_0}, \dot\pi_t^{\theta_0})$
(e.g. Theorem 2.1, Ch. III, [11]). The uniqueness of this measure as well as
the limits, required in (a-3), are deduced by standard arguments from (4.4)
and the fact that the distance between any two solutions of (4.3), started
from distinct initial conditions, converges to zero with positive asymptotic
exponential rate. Let $(\pi_t^{\theta_0}, \dot\pi_t^{\theta_0})$ be the stationary pair, then (4.3) implies







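$$\dot\pi_t^{\theta_0} = e^{-2\theta_0 t}\dot\pi_0^{\theta_0} + \int_0^t e^{-2\theta_0(t-s)}(1 - 2\pi_s^{\theta_0})\,ds - \int_0^t e^{-2\theta_0(t-s)}\pi_s^{\theta_0}(1 - \pi_s^{\theta_0})\dot\pi_s^{\theta_0}\,ds + \int_0^t e^{-2\theta_0(t-s)}(1 - 2\pi_s^{\theta_0})\dot\pi_s^{\theta_0}\,d\bar B_s,$$

and hence

$$E_{\theta_0}\Big(\int_0^t e^{-2\theta_0(t-s)}(1 - 2\pi_s^{\theta_0})\,ds\Big)^2 \le 4\big(1 + e^{-4\theta_0 t}\big)I(\theta_0) + \frac{4}{2\theta_0}\int_0^t e^{-2\theta_0(t-s)}\frac{1}{16}I(\theta_0)\,ds + 4\int_0^t e^{-4\theta_0(t-s)}I(\theta_0)\,ds \le C_1 I(\theta_0),$$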

with a constant $C_1 > 0$. The process $\zeta_t := \int_0^t e^{-2\theta_0(t-s)}(1 - 2\pi_s^{\theta_0})\,ds$ is the
solution of the equation

$$\dot\zeta_t = -2\theta_0\zeta_t + (1 - 2\pi_t^{\theta_0}), \qquad \zeta_0 = 0.$$

Elementary calculations show that 

2E,,(frg°-l/2)^ 

which is positive for any positive $\theta_0$. Hence $\inf_{\theta_0\in K} I(\theta_0) > 0$, as required.
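
As a purely numerical illustration of the quantity discussed in this subsection, $I(\theta_0)$ can be approximated by simulating the pair $(\pi_t^{\theta_0}, \dot\pi_t^{\theta_0})$ under $P_{\theta_0}$ and averaging $(\dot\pi_t^{\theta_0})^2$ over time. The sketch below rests on assumptions not made in the text: an Euler scheme with a fixed step, clipping of the filter to $[0,1]$, a burn-in period, and replacing the stationary expectation by a time average. The function name is hypothetical.

```python
import numpy as np

def estimate_fisher_information(theta0, T=200.0, dt=0.01, burn_in=20.0, seed=0):
    """Time-average estimate of I(theta0) = lim E (d pi / d theta)^2.

    Simulates, with the same innovation increments dB,
      d pi  = theta0 (1 - 2 pi) dt + pi (1 - pi) dB,
      d dpi = (1 - 2 pi) dt - (2 theta0 + pi (1 - pi)) dpi dt + (1 - 2 pi) dpi dB,
    i.e. the filter under P_theta0 and its derivative equation (4.3).
    """
    rng = np.random.default_rng(seed)
    n = int(round(T / dt))
    n_burn = int(round(burn_in / dt))
    pi, dpi = 0.5, 0.0
    acc, count = 0.0, 0
    for k in range(n):
        dB = np.sqrt(dt) * rng.standard_normal()
        new_pi = pi + theta0 * (1 - 2 * pi) * dt + pi * (1 - pi) * dB
        dpi = (dpi + (1 - 2 * pi) * dt
               - (2 * theta0 + pi * (1 - pi)) * dpi * dt
               + (1 - 2 * pi) * dpi * dB)
        pi = min(max(new_pi, 0.0), 1.0)    # keep the Euler iterate inside [0, 1]
        if k >= n_burn:                    # discard the transient before averaging
            acc += dpi**2
            count += 1
    return acc / count

if __name__ == "__main__":
    for theta0 in (0.5, 1.0, 2.0):
        print(theta0, estimate_fisher_information(theta0))
```

The printed values are Monte Carlo approximations only; they serve to visualize that the estimated information stays away from zero over a range of $\theta_0$, and carry no guarantee beyond that.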

5. A discussion

The result stated in Theorem 1.1 is extendable to the vector parameter
space $\Theta$, since the key properties such as (3.3), (3.6) and (3.7) do not depend
on the dimension of $\theta$. On the other hand, the setting where $h$ depends on
the parameter seems to be more delicate and would require additional effort,
mainly because the formula analogous to (3.7) in this case is more intricate
and involves Skorokhod anticipating integrals (see Proposition 4.1 in [5]).
As was mentioned before, the requirement (a-1) is essential and it is not
obvious whether the claimed results hold under a weaker form of ergodicity
of the chain $S$ (especially the convergence of moments). The requirements
(a-2) and (a-3) seem to be quite natural, though it is not clear at what level
of generality they can be verified in terms of the model data.

Appendix A. An LLN for processes with short correlation 

The following is a version of Lemma 2.1 in [16], adapted to our purposes. 
The proof mostly follows the lines of the original proof. 

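Lemma A.1. Let $\Phi(t; b)$ (we shall also write $\Phi(t)$ for brevity) be a stochastic
process with real values, depending on a parameter $b > 0$, such that
$E\Phi(t; b) = 0$ and

$$E|\Phi(t; b)|^m \le C_m b^m, \qquad m = 1, 2, \ldots \qquad (A.1)$$

for a constant $C_m$, possibly depending on $m$. Let $0 \le t_1 \le t_2 \le \ldots \le t_n$, and
assume that for any $i \in \{1, \ldots, n\}$

$$\big|E\Phi(t_1)\cdots\Phi(t_n) - E\Phi(t_1)\cdots\Phi(t_i)\,E\Phi(t_{i+1})\cdots\Phi(t_n)\big| \le C_n b^n \alpha(t_{i+1} - t_i) \qquad (A.2)$$

with a nonnegative decreasing function $\alpha(\tau)$, such that

$$A_n := \int_0^\infty \tau^{n-1}\alpha(\tau)\,d\tau < \infty, \qquad n = 1, \ldots, k.$$

Then

$$\int_0^T \cdots \int_0^T \big|E\Phi(s_1)\cdots\Phi(s_{2k})\big|\,ds_1\cdots ds_{2k} \le C_{2k}\,b^{2k}\,T^k,$$

where $C_{2k}$ are constants, depending only on $k$ and $A_1, \ldots, A_k$.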
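Proof. The lemma is proved by induction. The bound (A.2) implies

$$\int_0^T\int_0^T \big|E\Phi(s_1)\Phi(s_2)\big|\,ds_1\,ds_2 \le 2C_2 b^2\int_0^T\int_0^t \alpha(s)\,ds\,dt \le 2C_2 b^2 A_1 T,$$

and hence the claim holds for $k = 1$. Suppose now that the lemma has been
proved for $k \le n$. Let $s = (s_1, \ldots, s_{2n+2}) \in [0, T]^{2n+2}$. Let $j_1, \ldots, j_{2n+2}$ be the
permutation of the indices such that $s_{j_1} \le \ldots \le s_{j_{2n+2}}$, and let $r = r(s)$
be any index for which

$$\max_{\ell=1,\ldots,n}\big(s_{j_{2\ell+1}} - s_{j_{2\ell}}\big) = s_{j_{2r+1}} - s_{j_{2r}}.$$

From (A.2) it follows that

$$\big|E\Phi(s_1)\cdots\Phi(s_{2n+2}) - E\Phi(s_{j_1})\cdots\Phi(s_{j_{2r}})\,E\Phi(s_{j_{2r+1}})\cdots\Phi(s_{j_{2n+2}})\big| \le C b^{2n+2}\alpha\big(s_{j_{2r+1}} - s_{j_{2r}}\big). \qquad (A.3)$$

Since $\Phi(t)$ is zero mean, (A.2) implies

$$\big|E\Phi(s_1)\cdots\Phi(s_{2n+2})\big| \le C b^{2n+2}\alpha\big(s_{j_{2n+2}} - s_{j_{2n+1}}\big),$$

$$\big|E\Phi(s_{j_{2r+1}})\cdots\Phi(s_{j_{2n+2}})\big| \le C b^{2n+2-2r}\alpha\big(s_{j_{2n+2}} - s_{j_{2n+1}}\big).$$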
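Moreover, by the Hölder inequality and (A.1), $E\big|\Phi(s_{j_1})\cdots\Phi(s_{j_{2r}})\big| \le C b^{2r}$,
and thus (hereafter $c$ is a constant, possibly depending on $n$, whose value
may differ from line to line)

$$\big|E\Phi(s_1)\cdots\Phi(s_{2n+2}) - E\Phi(s_{j_1})\cdots\Phi(s_{j_{2r}})\,E\Phi(s_{j_{2r+1}})\cdots\Phi(s_{j_{2n+2}})\big| \le c\,b^{2n+2}\alpha\big(s_{j_{2n+2}} - s_{j_{2n+1}}\big).$$

As $\alpha(\tau)$ decreases, the latter and (A.3) imply

$$\big|E\Phi(s_1)\cdots\Phi(s_{2n+2}) - E\Phi(s_{j_1})\cdots\Phi(s_{j_{2r}})\,E\Phi(s_{j_{2r+1}})\cdots\Phi(s_{j_{2n+2}})\big| \le 2c\,b^{2n+2}\alpha\Big(\max\big(s_{j_{2n+2}} - s_{j_{2n+1}},\; s_{j_{2r+1}} - s_{j_{2r}}\big)\Big). \qquad (A.4)$$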

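By the definition of $r$,

$$\sigma(s) := \max\big(s_{j_{2n+2}} - s_{j_{2n+1}},\; s_{j_{2r+1}} - s_{j_{2r}}\big) \ge \frac{1}{n+1}\Big(\sum_{\ell=1}^n \big(s_{j_{2\ell+1}} - s_{j_{2\ell}}\big) + \big(s_{j_{2n+2}} - s_{j_{2n+1}}\big)\Big)$$

and as $\alpha(\tau)$ decreases, we have

$$\int_0^T \cdots \int_0^T \alpha(\sigma(s))\,ds \le (2n+2)! \int_{0 \le s_1 \le \ldots \le s_{2n+2} \le T} \alpha\Big(\frac{1}{n+1}\Big(\sum_{\ell=1}^n (s_{2\ell+1} - s_{2\ell}) + (s_{2n+2} - s_{2n+1})\Big)\Big)\,ds.$$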



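Using the formula

$$\int_0^\infty \cdots \int_0^\infty \alpha(t_1 + \ldots + t_{n+1})\,dt_1\cdots dt_{n+1} = \frac{1}{n!}\int_0^\infty u^n\alpha(u)\,du,$$

the following estimate is obtained:

$$\int_0^T \cdots \int_0^T \alpha(\sigma(s))\,ds \le C_n A_{n+1} T^{n+1}, \qquad (A.5)$$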


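where $C_n$ depends only on $n$. Further,

$$\int_0^T \cdots \int_0^T \big|E\Phi(s_{j_1})\cdots\Phi(s_{j_{2r}})\big|\,\big|E\Phi(s_{j_{2r+1}})\cdots\Phi(s_{j_{2n+2}})\big|\,ds \le (2n+2)!\sum_{\ell=1}^n \int_0^T \cdots \int_0^T \big|E\Phi(s_1)\cdots\Phi(s_{2\ell})\big|\,ds_1\cdots ds_{2\ell} \times \int_0^T \cdots \int_0^T \big|E\Phi(s_{2\ell+1})\cdots\Phi(s_{2n+2})\big|\,ds_{2\ell+1}\cdots ds_{2n+2}.$$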
Jo Jo 
PAVEL CHIGANSKY



By the induction hypothesis, each of the terms in the sum on the right hand
side is bounded by $C_{2n} b^{2n+2} T^{n+1}$ and hence using (A.4), (A.5) we obtain